Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track

Authors

Isao Namba

Nobuyuki Igata

Abstract

This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers and developed simple techniques for removing stop patterns from TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization, and reranking by bi-gram extraction from a pilot search. The effect of blind application of those techniques is rather limited, or even uncertain, in the TREC8 experiment. What we can say from the TREC8 results is that blind application of co-occurrence boosting and area weighting may be effective for the small web track. These techniques require query-dependent application. In the large web track, our main interest is efficiency, that is, how many resources are required to process 100GB of web text and 10000 real web queries in practical time. Using a statistical language type checker, we can eliminate 23% of non-English text. This speeds up indexing and reduces the index size. Searching an inverted file is CPU intensive if the target machine has main memory in excess of 10-25% of the index size. So with simple but effective index compression methods, the throughput of query processing is about 0.54-1.1 queries/second even on a single 300MHz UltraSPARC processor.
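To make "practical time" concrete, the following minimal sketch converts the throughput range quoted in the abstract into wall-clock time for the 10000-query batch. It assumes the queries run sequentially at a steady rate on one processor; the function name `batch_hours` is illustrative and does not come from the report.

```python
def batch_hours(num_queries: int, queries_per_second: float) -> float:
    """Wall-clock hours to run a query batch at a steady throughput."""
    if queries_per_second <= 0:
        raise ValueError("throughput must be positive")
    return num_queries / queries_per_second / 3600.0


# Figures quoted in the abstract; the steady-rate model is an assumption.
NUM_QUERIES = 10_000
SLOW_QPS, FAST_QPS = 0.54, 1.1

slowest = batch_hours(NUM_QUERIES, SLOW_QPS)
fastest = batch_hours(NUM_QUERIES, FAST_QPS)
print(f"{fastest:.1f} to {slowest:.1f} hours")  # prints "2.5 to 5.1 hours"
```

Under that assumption the whole query set finishes in roughly two and a half to five hours on the single processor described.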

Similar resources

Fujitsu Laboratories TREC8 Report

PLIERS at TREC8

The use of the PLIERS text retrieval system in TREC8 experiments is described. The tracks entered are Ad-Hoc, Filtering (Batch and Routing), and the Web Track (Large only). We describe both retrieval efficiency and effectiveness results for all these tracks. We also describe some preliminary experiments with BM_25 tuning constant variation.

An Early DiscoWeb Prototype at TREC8

Recently the notion of popularity and its generalizations have been investigated as a possible alternative approach to text only analysis to rank web pages in search engines (e.g. [Kle98, BP98, CDR98, CDDG98, BH98, HHMN99] among others). We have built a research prototype that incorporates many link analysis algorithms from the literature and also new algorithms to investigate the impact of the...

Fujitsu Laboratories TREC9 Report

This year a Fujitsu Laboratory team participated in web tracks. For TREC9 we experimented with passage retrieval, which is expected to be effective for Web pages that contain more than one topic. To split documents into passages, we used an NLP-based paragraph detecting program rather than a fixed (variable) window size. But it did not produce better results for TREC9 Web data. For indexing large web data faster, ...

Fujitsu Laboratories TREC2001 Report

This year a Fujitsu Laboratory team participated in web tracks. For both the ad hoc task and the entry point search task, we combined the score of normal ranking search with that of page ranking techniques. For the ad hoc style task, the effect of page ranking was very limited. We got only a very small improvement for title field search, and the page rank was not effective for description, and narrative field sear...

Publication date: 1999
